Abstract

Background: Though administrative databases are increasingly being used for research related to myocardial infarction (MI),
the validity of MI diagnoses in these databases has never been synthesized on a large scale.

Objective: To conduct the first systematic review of studies reporting on the validity of diagnostic codes for identifying MI
in administrative data.

Methods: MEDLINE and EMBASE were searched (inception to November 2010) for studies: (a) Using administrative data to
identify MI; or (b) Evaluating the validity of MI codes in administrative data; and (c) Reporting validation statistics (sensitivity,
specificity, positive predictive value (PPV), negative predictive value, or Kappa scores) for MI, or data sufficient for their
calculation. Additional articles were located by handsearch (up to February 2011) of original papers. Data were extracted by
two independent reviewers; article quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool.

Results: Thirty studies published from 1984-2010 were included; most assessed codes from the International Classification
of Diseases (ICD)-9th revision. Sensitivity and specificity of hospitalization data for identifying MI in most [≥50%] studies was
≥86%, and PPV in most studies was ≥93%. The PPV was higher in the more-recent studies, and lower when criteria that do
not incorporate cardiac troponin levels (such as the MONICA) were employed as the gold standard. MI as a cause-of-death
on death certificates also demonstrated lower accuracy, with maximum PPV of 60% (for definite MI).

Conclusions: Hospitalization data has higher validity and hence can be used to identify MI, but the accuracy of MI as a
cause-of-death on death certificates is suboptimal, and more studies are needed on the validity of ICD-10 codes. When
using administrative data for research purposes, authors should recognize these factors and avoid using vital statistics data
if hospitalization data is not available to confirm deaths from MI.
Introduction

Cardiovascular diseases (CVD), including myocardial infarction
(MI), are associated with physical disability, reduced quality-of-life, 
economic hardship, and death. In 2008 CVD accounted for 30% 
of all deaths globally [1], and annual cost estimates for CVD have 
recently exceeded €169 billion for the European Union [2] and 
$400 billion in the United States [3]. Although age is one of the
primary risk factors for CVD, growing evidence suggests that
chronic conditions including inflammatory rheumatic diseases [4- 
9], osteoarthritis [10], diabetes [11], and clinical depression [12] 
are also associated with an increased risk of CVD, independent of 
age. 

In parallel, there is increasing recognition of the value of
administrative data for use in disease surveillance [13-19], and this
data source has been key in identifying the associations between
chronic diseases and CVD as mentioned above. Administrative 
databases provide easy access to data for a large number of 
patients attending multiple centres, with longer follow-up periods 
at relatively low cost. For example, the universal provision of 
publicly-funded health care in Canada allows the patient-level
linkage of health resource utilization data (including hospital 
separations, outpatient visits, procedures and tests, and, in some 
provinces, dispensed prescriptions) for nearly every resident of 
each province to demographic and vital statistics data. Conse- 
quently, both selection and recall bias are minimized. 

Despite these advantages, much uncertainty exists around the 
validity of diagnoses recorded in administrative data since most 
databases are not established for research purposes. Instead, 
records of each healthcare encounter are submitted by physicians 
and hospital staff primarily to obtain reimbursement. Thus, not all
conditions may be recorded in the databases, and those recorded 
may not correspond to the date of disease onset or reflect the true 
diagnosis and assessment made by the treating physician. These 
errors and inconsistencies in diagnostic codes may lead to 
misclassification bias, impacting the quality of research using
these sources and, in turn, any changes in health policy and care 
practices stemming from it. For example, failure to adequately 
capture the number of people afflicted by CVD may underesti- 
mate the burden of these diseases, thus limiting the health 
resources allocated to address them. Alternatively, when studying 
long-term health outcomes, capturing an excess number of false- 
positive cardiovascular events could overestimate the risks 
associated with an otherwise beneficial therapy or intervention. 

While several assessments of the validity of cardiovascular codes 
have been published [20-23], most concerned a single CVD and 
were conducted within a limited geographic area, restricting their 
generalizability. Much inconsistency exists with regard to the
methods (including the source of the population and gold 
standards) adopted by these studies and the way in which results 
are reported. To our knowledge, data on the validity of these codes
have not yet been synthesized on a larger scale. 

As part of a Canadian Rheumatology Network for establishing
best practices in the use of administrative data for health research 
and surveillance (CANRAD) [13,19,24], our objective was to 
conduct a systematic review of studies reporting on the validity of 
diagnostic codes for identifying CVD in administrative data. Data 
from these studies were used to compare the validity of these 
codes, and to evaluate whether administrative health data can 
accurately identify CVD for the purpose of identifying these events
as covariates, outcomes, or complications in future research. We 
focus on MI in this paper, and will discuss two other CVD, 
congestive heart failure and cerebrovascular accident, in subse- 
quent reports. 

Methods 

Literature Search 

Comprehensive searches of the MEDLINE and EMBASE 
databases from inception (1946 and 1974, respectively) to 
November 2010 for all available peer-reviewed literature were 
conducted by an experienced librarian (M-DW). Two search 
strategies were employed: (1) all studies where administrative data 
was used to identify CVD; (2) all studies reporting on the validity 
of administrative data for identifying CVD. Our MEDLINE and 
EMBASE search strategies are available as supplementary 
materials (Text S1 and S2). To find additional articles, the
authors hand-searched the reference lists of the key articles located 
through the database search. The Cited-By tools in PubMed and 
Google Scholar were also used to find relevant articles that had 
cited the articles located through the database search (up to 
February 2011). The titles and abstracts of each record were 
screened for relevance by two independent reviewers. No protocol 
for this systematic review has been published, though the review 
was conducted in accordance with the Preferred Reporting Items 
for Systematic Reviews and Meta-Analyses (PRISMA) Statement; 
our completed PRISMA checklist is provided as supplementary 
material (Checklist S1). More information about the CANRAD
project is available here [13]. 

Inclusion Criteria 

We selected full-length peer-reviewed articles published in 
English that used administrative data and reported validation
statistics for the International Classification of Diseases (ICD)
codes of interest or provided sufficient data enabling us to calculate
them. We included studies evaluating particular diagnostic codes
for acute MI (being ICD-8 & ICD-9 code 410 and ICD-10 codes
I21 & I22) and excluded studies that evaluated umbrella diagnoses.
This means we did not include validity statistics from studies where
other codes were included in the algorithm for MI (i.e. 410-411 or
410-414). For example, the MI statistics in one study [25] were 
not included because the algorithm included a code for cardiac 
arrest (ICD-9 427.5); those in three others [26-28] were not 
included because those algorithms contained codes for old MI 
(ICD-9 412 and ICD-10 code I25.2). Any discrepancies were
discussed until consensus was reached. When the conflict persisted 
a third reviewer (JAA-Z) was consulted. 

Data Extraction 

The full text of each selected record was examined by two 
independent reviewers (NM and VB) who abstracted data using a 
standardized collection form (a copy is provided in Text S3)
developed for the CANRAD investigations. While extracting data, 
particular attention was given to the study population, adminis- 
trative data source, algorithm used to identify the CVDs, 
validation method and gold standard. Validation statistics 
comparing the MI codes listed above to definite, probable, or
possible cases were abstracted. These statistics included sensitivity, 
specificity, positive predictive value (PPV), negative predictive 
value (NPV), and kappa scores. Because hospital separations 
typically contain multiple diagnoses, with the primary or principal
diagnosis in the first position followed by one or more secondary 
diagnoses, we abstracted statistics for each of these positions, 
where available. Data were independently abstracted by each
reviewer, who subsequently compared their forms to correct any
errors and resolve discrepancies. 

The design and methods used by each study (for example, 
whether or not the diagnosis recorded in the administrative 
database formed part of the reference standard) can directly
influence the validity statistics produced. Thus, all studies were
evaluated for quality, and the validation statistics were stratified by
level of study quality. We used the Quality Assessment of
Diagnostic Accuracy Studies (QUADAS) tool [29] (available as a
part of Text S3), used previously by the CANRAD network in
assessing the validity of codes for osteoporosis and fractures [30].
Briefly, it is a 14-item evidence-based quality assessment tool used
in systematic reviews of diagnostic accuracy studies. Each item, 
phrased as a question, addresses one or more aspects of bias or 
applicability; however, there is no overall score. Instead, as done 
previously [30], items were independently answered by each
reviewer and used to qualitatively assess each study as High,
Medium, or Low quality. Any disagreements were resolved by
consensus. 

Statistical Analysis 

All validation statistics were abstracted as reported. Where
sufficient data were available we calculated 95% confidence
intervals (95% CI) and additional validity statistics not directly
reported in the original publication. For each CVD these were
evaluated in aggregate, and, as pre-specified, stratified by
administrative data source (i.e. hospitalization vs. vital statistics).
Sensitivity (the ability of the codes to identify true positive cases) 
was equal to the number of true positives divided by the sum of 
true positives and false negatives (all those who are diseased). 
Specificity (the ability of the codes to exclude false-positive cases) 
was equal to the number of true negatives divided by the sum of 
true negatives and false positives (all those who are non-diseased). 
PPV (the likelihood that the code corresponds to a true-positive 
case) was equal to the number of true positives divided by the total
number of cases receiving the code (true-positives and false-positives).
NPV (the likelihood that a record not coded for the
condition is a true-negative case) was equal to the number of true
negatives divided by the total number of cases without the code
(true-negatives and false-negatives). Kappa (a measure of agreement
beyond that expected by chance) is equal to the observed
agreement minus that expected by chance, divided by [100% - the
agreement expected by chance]. Values greater than 0.60 were considered
substantial/perfect agreement, 0.21-0.60 fair/moderate agreement, and
those 0.20 or lower slight/poor agreement [31].
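
The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the 2x2 counts are hypothetical, and the Wilson score interval is an assumption, since the review does not state which method was used to compute the 95% CI.

```python
from math import sqrt


def validity_stats(tp, fp, fn, tn):
    """Validity statistics for a 2x2 table of code (rows) vs gold standard."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement
    # Chance agreement: product of the marginal proportions, summed over
    # the "both positive" and "both negative" outcomes.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),  # TP / all diseased
        "specificity": tn / (tn + fp),  # TN / all non-diseased
        "ppv": tp / (tp + fp),          # TP / all records with the code
        "npv": tn / (tn + fn),          # TN / all records without the code
        "kappa": (observed - expected) / (1 - expected),
    }


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def kappa_label(kappa):
    """Agreement bands as described in the text [31]."""
    if kappa > 0.60:
        return "substantial/perfect"
    if kappa > 0.20:
        return "fair/moderate"
    return "slight/poor"


# Hypothetical counts, not taken from any included study.
stats = validity_stats(tp=90, fp=10, fn=15, tn=885)
ppv_low, ppv_high = wilson_ci(90, 100)
print({name: round(value, 3) for name, value in stats.items()})
print(f"PPV 95% CI: {ppv_low:.3f}-{ppv_high:.3f}")
print(kappa_label(stats["kappa"]))
```

With these hypothetical counts, the PPV is 90/100 and the sensitivity is 90/105, which shows how a code can have a high PPV while still missing some true cases.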

Where available, we abstracted statistics for definite, probable, 
and possible cases of MI. However the choice of gold standard 
dictates the number of categories reported, and some studies will
classify cases simply as MI or no MI. Under the American Heart 
Association (AHA) [32] and Joint European Society of Cardiol- 
ogy/American College of Cardiology (ESC/ACC) criteria, true- 
positive cases are classified as either definite, probable, possible, or 
no MI. However, the MONICA criteria, used in the World Health 
Organization (WHO)'s Multinational MONItoring Trends and
Determinants in CArdiovascular Disease project, only uses three
categories. Briefly, the MONICA project was conducted over 10
years (during the 1980's and 1990's) across 32 study areas in 21 
countries to monitor trends in cardiovascular diseases and changes 
in risk factors [33]. As part of the study, all suspected coronary 
events in those aged 25-64 years were entered into a registry. 
Suspected events were identified prospectively (while cases were in 
hospital) and retrospectively (by examining hospital databases and 
death certificates), and study physicians used the MONICA 
criteria to classify these events as definite, possible or no MI [33]. 
The criteria considered symptoms, electrocardiogram (EKG) 
findings and cardiac enzyme levels when making the diagnosis.
'Definite' cases are the most certain because they meet the strictest
criteria for each CVD (enzyme levels and EKG in addition to
typical symptoms) while 'Possible' cases include typical symptoms
only [33]. Because more potential cases are expected to fulfil the
broader criteria for 'Definite or Possible', the PPV for this broader 
category should be greater. However, this comes at a cost to 
specificity since more false-positives will meet these broader 
criteria too. 
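
As a small worked illustration of why the PPV rises under the broader category, the sketch below classifies the same set of coded records against both definitions. The counts and the function name are hypothetical and are not drawn from any included study.

```python
def ppv_by_category(definite, possible, no_mi):
    """PPV of the MI code against the strict and the broad gold standard.

    Arguments are counts of coded records that the gold standard
    classified as definite MI, possible MI, or no MI.
    """
    coded = definite + possible + no_mi
    return {
        "definite": definite / coded,
        "definite_or_possible": (definite + possible) / coded,
    }


# Hypothetical: 200 coded records, 90 definite, 80 possible, 30 no MI.
ppv = ppv_by_category(definite=90, possible=80, no_mi=30)
print(ppv)
```

Because every 'Possible' case counts as a true positive under the broader definition, the broad PPV can never be lower than the strict PPV for the same set of coded records.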

Results 

Literature Search 

After the removal of duplicates, 1,587 citations were identified 
through MEDLINE and EMBASE searches and screened for 
relevance to our study objectives. We then assessed 98 full-text
articles for eligibility (Figure 1), of which 22 were selected for
inclusion. We also assessed 30 full-text articles for eligibility that
were identified from other sources, and selected 8 additional 
articles therein. This meant a total of 128 articles were assessed for 
eligibility, from which 98 were excluded, mainly because they 
reported on the validity of other CVD (n = 41), or did not actually 
validate MI diagnoses in administrative data (n = 20). Six articles 
were excluded because they were not published in English; their 
languages of publication were Danish, German, Italian, Japanese, 
Portuguese, and Spanish. Ultimately, 30 articles were included for
the systematic review of MI. 

Study Characteristics 

Of the 30 studies evaluating MI diagnoses that were included in 
the final review, 12 (40%) were from Europe, 8 (27%) were from 
the United States (USA), 7 (23%) were from Canada, 2 (7%) were
from New Zealand, and 1 (3%) was from Australia. Characteristics
of these studies are presented in Table 1. Validation was the
primary research objective in 26 (87%) of them. Altogether data
were collected over a 34-year period (1970 to 2003) that covered 
three revisions of the ICD system (ICD-8, ICD-9, and ICD-10). 
Nearly all administrative data sources pertained to hospitalizations 
with algorithms consisting of ICD diagnostic codes but no 
procedure codes. Five studies evaluated the validity of MI as a 
cause-of-death on death certificates, but none of the studies 
evaluated diagnoses for outpatient encounters. National and 
regional disease registries and surveillance systems served as the 
gold standards in 10 (33%) studies [20,21,34-41]. In the 20 
remaining studies, the gold standards were based on chart reviews, 
often in consultation with established diagnostic criteria. Just two 
studies [42,43] reported on the validity of ICD-10 codes separately 
from ICD-9 codes. 

Study quality was evaluated based on the QUADAS tool [29],
with 26 of 30 studies (87%) categorized as high quality, and four
(13%) as medium quality. A detailed breakdown of the evaluations
for each study is provided in Table S1. In one of the medium-
quality studies [44] the validation process was not adequately 
described, while the gold standard in another [45] was considered 
less-reliable because charts of potential MI cases were not 
evaluated by a clinician. The two other medium-quality studies 
employed a select source population - male smokers aged 50-69 
years in one [46], and those aged 65 years or older in another [47] 
- which limited their generalizability. 

PPV data were available from all but one study [39] while the 
kappa statistic was reported in only two studies [21,48]. Sensitivity,
specificity, and NPV were less-frequently reported by authors, but
sufficient data to allow calculation of these statistics were often
available and included when the source population was sufficiently
broad (i.e. when it was not confined to cases receiving codes of
ICD-9 410-414, which correspond to a more general category of 
coronary heart diseases that includes MI). 

Validity of Myocardial Infarction Diagnoses

The validation statistics reported by each of the included studies 
are provided in Table 2. Sensitivity was reported by 12 studies, 
and was at least 86% in half of them. PPV, obtained from 29 
studies, was ≥93% in the majority (n = 15) of them. Specificity and
NPV were available only from three studies [22,40,48] and in
these ranged from 89-99%, and 75-99%, respectively. Five
studies [34-36,43,45] provided sex-stratified statistics and in four
of these [35,36,43,45] sensitivity and PPV values were higher for 
males (Table 2). Twenty-six of the 30 studies on MI (87%) were of 
high quality and the PPV was >80% in 20 of 25 (80% of the high- 
quality studies). One high-quality study [39] did not report PPV. 
One of the medium-quality studies reported a PPV of 81% [44], 
while in the three others [45-47] this value ranged from 95-98%. 
None of the medium-quality studies reported on sensitivity, 
specificity, NPV, or kappa. 

In order to examine secular trends in the validity of MI codes, 
the studies in Tables 2 and 3 have been ordered chronologically by 
publication year. Half of the MI studies were published between 
1984 and 1998, and the other half from 1999 to 2010. No clear
trends in sensitivity were observed amongst the twelve studies
reporting this statistic. However, at least amongst studies providing
statistics on hospitalization data, we did observe somewhat of a
trend towards higher PPV's in later years: the PPV was ≥89% in
eight of the ten most-recent studies (from 2002 to 2010) while only
four out of the 10 earliest studies (from 1984 to 1995) reported
PPV ≥89%. Of interest, Rosamond et al [36] analysed the validity
of MI diagnoses recorded from 1987 to 2000, with no secular 

2,173 records identified through database searching (MEDLINE, EMBASE)
30 additional records identified through other sources
1,617 records after duplicates removed
1,617 records screened
1,489 records excluded after reading abstract
128 full-text articles assessed for eligibility
98 full-text articles excluded:
1. Pertained only to congestive heart failure or cerebrovascular accident (n=41)
2. Did not validate administrative data or report validity statistics (n=20)
3. Did not report specific ICD codes or validated other codes (e.g. Read codes) (n=10)
4. No full text available/published as conference abstract (n=7)
5. Extra codes included in search algorithm or did not evaluate MI specifically (n=7)
6. Duplicate records (n=7)
7. Full-text not available in English (n=6)
30 studies included in the qualitative systematic review of MI diagnoses:
- 22 identified through database searching
- 8 identified through other sources

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-style Flowchart of Study Selection and
Review. ICD = International Classification of Diseases; MI = myocardial infarction.
doi:10.1371/journal.pone.0092286.g001



trends overall in sensitivity or PPV. We were unable to directly 
evaluate any secular trends in specificity or NPV as there were 
very few studies (n = 3) reporting these statistics. 

As expected, there was also some variability in results with 
regards to the selection of gold standard and specific diagnostic 
criteria. The MONICA criteria, described above, were used in 12
studies [20,21,34,35,37-39,41,46,49-51], and the sensitivity and
PPV in these were lower than in studies using the current criteria.
For example, the reported sensitivity of ICD 410 for detecting
cases of definite or possible MI using the MONICA criteria was 43% [20]
in one study and ranged from 56-72% [34] in another. However,
the PPV was noticeably higher (94-95% in the primary or
secondary admission position) [47] in one article where levels of an 
additional biomarker of cardiac damage, troponin, were consid- 
ered in addition to the standard MONICA criteria. In one study 
comparing the PPV's associated with two gold standards, the PPV 
for definite MI was 86% using American Heart Association (AHA) 
criteria but only 53% using MONICA criteria [49]. Finally, while
it wasn't consistent across all studies using the MONICA criteria,
the PPV's were generally higher in those that were part of an 
actual MONICA registry [20,21,34,35,37,38,41] than in other 
investigations that simply used the MONICA criteria to evaluate 
potential cases of MI [46,49-51]. 

The PPV values from studies that reported on hospitalization 
data and incorporated a formal set of diagnostic criteria in their 
gold standard are plotted in Figure 2. The studies are ordered 
chronologically by year of publication. Figure 2a contains the 
estimates pertaining to the stricter parameter of "Definite MI", 
and Figure 2b contains the estimates pertaining to the broader 
parameter of "Definite or Probable or Possible MI", as estimates 
for these two parameters cannot be directly compared. If no 
parameter was specified in the study (i.e. the MI code was
compared to a diagnosis of simply "myocardial infarction"), we 
include that estimate in both figures. To allow for visual inspection 
of the impact of cardiac troponin measurement on the PPV of MI 
diagnoses, the PPV's in Figure 2 are colour-coded as to whether 
or not levels of cardiac troponin were included in the diagnostic
criteria. 

We also stratified results by geographic regions (Europe, the 
South Pacific (Australia and New Zealand), Canada, and the
USA), and there was little difference in the sensitivity values
reported in each region (Table 2). Similarly, there were few
differences in the PPV's from different regions; this value was
>80% in most of the Canadian and US studies, and ≥89% in all
11 European studies reporting this statistic. However, the PPV's in
the three studies from the South Pacific were comparatively lower,
with values ranging from 49% [38] to 82% [20].

In most studies [≥50%] providing hospital statistics, PPV values
were ≥93%, but the accuracy of MI as a cause-of-death on death
certificates was much lower. For example, the PPV for definite MI 
amongst these studies was <60% (Table 3), while in many of the 
studies from hospitalization databases the PPV for definite MI was 
>86% when using the strictest category. 

Discussion 

To our knowledge this is the first systematic review on the 
validity of MI diagnoses in administrative data. Overall, MI 
diagnostic codes from hospitalization data appear to be valid: in 
more than half of the studies, sensitivity and specificity exceeded 
83%, and PPV exceeded 92%. Therefore, we believe hospitaliza- 
tion data can be used to identify MI either as a covariate or as an 
outcome. The accuracy of MI as a cause of death on death 
certificates was lower, with the highest PPV for definite fatal MI
being 59% amongst the studies included. In comparison, the PPV 
was greater than 59% in three-quarters of the studies reporting on 
hospitalization data. Accordingly, caution should be taken when 
using vital statistics data to identify deaths from MI, and authors 
are encouraged to acknowledge this limitation. 

It is possible that our findings on the accuracy of MI diagnoses 
were unduly influenced by publication bias or selective outcome 
reporting, wherein some authors who did assess the validity of MI 
codes in their study may have chosen not to report the statistics if 
they were low. But while our findings for MI in hospitalization 
data were generally positive, there were exceptions. For example, 
we observed that the accuracy of MI diagnoses was heavily 
influenced by the gold standard employed, with lower statistics 
when the previously-used, more conservative MONICA criteria 
[52] were applied. These criteria, developed in the 1970's and 80's 
from international standards, differ from more recent criteria with 
regards to the biomarkers of cardiac damage. The creatine kinase, 
lactate dehydrogenase, and aspartate transaminase enzymes are 
part of MONICA [33], used by 12 studies in this review 
[20,21,34,35,37-39,41,46,49-51]. Three studies [43,49,53] used 
the 2003 American Heart Association (AHA) criteria, which 
consider levels of cardiac troponin [32] - a component of cardiac 
muscle and a more sensitive and specific indicator of myocardial
damage [54] - in addition to creatine kinase. Similarly, in the Joint
European Society of Cardiology/American College of Cardiology
(ESC/ACC) criteria [55] - used in two studies [42,53] - troponin 
levels take precedence over creatine kinase, and neither aspartate 
transaminase nor lactate dehydrogenase (the two other enzymes 
from MONICA) are considered markers of cardiac damage [56].

Support for the increased sensitivity of cardiac troponin is 
provided by many clinical and population-based studies [57-59] 
where more cases of MI were detected when applying the new 
criteria than when applying the MONICA criteria. Consistent with this, some
authors have shown that, when defined by the older criteria, the 
incidence of MI appears to have declined over the decades, but 
when the newer criteria are applied, the incidence appears to have 

Table 3. Results of studies validating diagnoses of myocardial infarction (MI) as a cause-of-death (COD) in vital statistics data (in
ascending order of publication).

First Author, Year   | Diagnostic Codes  | Parameter                              | Sensitivity (95% CI) | PPV (95% CI)        | Quality
Jackson, 1988 [38]   | ICD9 410          | definite MI                            | 84.03 (78.61-88.32)  | 48.66 (43.74-53.60) | High
Boyle, 1995 [20]     | ICD9 410          | definite MI                            | 79.85 (75.49-83.62)  | 25.56 (23.18-28.11) | High
                     |                   | definite or possible MI                | 72.66 (70.09-75.09)  | 73.71 (71.15-76.12) |
Rapola, 1997 [46]    | ICD8 and ICD9 410 | definite MI                            |                      | 45.31 (36.58-54.33) | Medium
                     |                   | definite or possible MI                |                      | 95.31 (89.64-98.08) |
De Henauw, 1998 [39] | ICD9 410          | definite or possible as underlying COD | 49.13 (46.72-51.56)  |                     | High
Mahonen, 1999 [34]   | ICD8 410          | definite MI                            | 91.08 (88.99-92.81)  | 54.95 (52.39-57.48) | High
                     |                   | definite or possible MI                | 72.37 (70.35-74.30)  | 96.74 (95.68-97.56) |
                     | ICD9 410          | definite MI                            | 89.28 (87.21-91.06)  | 55.64 (53.22-58.03) |
                     |                   | definite or possible MI                | 67.27 (65.36-69.13)  | 97.56 (96.67-98.22) |

COD = cause-of-death.

doi:10.1371/journal.pone.0092286.t003



remained steady [60] or even increased [61]. In other words, more 
cases will be classified as MI under the newer criteria than the old.
Thus, given the increased sensitivity of the newer criteria, we 
expected to see greater sensitivity values amongst the more 
recently-published studies in this review, but we did not observe a 
trend in either direction. Amongst the ten studies reporting on the 
sensitivity of MI diagnoses in hospital data, sensitivity in the five 
earlier studies ranged from 80-94%, while in the five later studies
it ranged similarly from 69-93%. This may simply be due to the 
comparatively small number of studies where sensitivity was 
reported, though heterogeneity in the study settings may also play 
a role. One study included in our review, by Rosamond et al [36], 
evaluated the sensitivity and PPV of ICD-9 410 over the period 
1987-2000. They reported that while overall, these statistics
remained relatively stable, amongst teaching hospitals they 
declined significantly (with sensitivity declining from 74% to 
59%, and PPV from 80% to 71%). In contrast, in a study 
conducted at a university hospital in the Netherlands, both 
sensitivity and PPV were higher in the later period (years 1996- 
2003) than the earlier period 1987-1995 (with sensitivity 
increasing from 82% to 85%, and PPV from 94% to 99%) [62]. 

In addition to being more sensitive, cardiac troponin is also a 
more specific indicator of MI. Although few studies in this review 
reported specificity values directly, this statistic can be analysed by 
way of PPV. Specificity is equal to 1 minus the false-positive rate (the
proportion of non-diseased cases that receive the code), so will increase as
the number of false-positive cases decreases.
PPV is the proportion of true-positives amongst all true-positive 
and false-positive cases, so will also increase as the number of false- 
positive cases decreases. The fact that the PPV's for hospitalization 
data generally increased over time provides support for an increase 
in the specificity of MI diagnoses as well. 
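
The link between specificity and PPV can be illustrated numerically. In the sketch below the number of true positives and the number of non-diseased records are held fixed while the number of false positives shrinks; all counts are hypothetical and chosen only to show the direction of the effect.

```python
def specificity(tn, fp):
    """TN / (TN + FP), i.e. one minus the false-positive rate."""
    return tn / (tn + fp)


def ppv(tp, fp):
    """TP / (TP + FP)."""
    return tp / (tp + fp)


TRUE_POSITIVES = 90   # hypothetical, held fixed
NON_DISEASED = 900    # hypothetical TN + FP, held fixed

rows = []
for false_positives in (40, 20, 10):
    true_negatives = NON_DISEASED - false_positives
    rows.append(
        (
            false_positives,
            specificity(true_negatives, false_positives),
            ppv(TRUE_POSITIVES, false_positives),
        )
    )

for fp_count, spec, pos_pred in rows:
    print(f"FP={fp_count:>2}  specificity={spec:.3f}  PPV={pos_pred:.3f}")
```

Both statistics rise as false positives fall, which is the reasoning used here to read a rising PPV as indirect evidence of rising specificity. Note that PPV also depends on the number of true positives among coded records, so the inference holds only when that number is roughly comparable across the studies being compared.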

When comparing the performance of the newer diagnostic 
criteria to the MONICA, the contribution of other secular changes 
must be considered. One factor is the use of different revisions of 
the ICD coding system in different time periods. Mahonen et al 
[35] found that the sensitivity of ICD 410 was generally lower 
during the period 1987-1990 (ICD-9) than 1983-1986 (ICD-8), 
even though the same diagnostic criteria (FINMONICA, a Finnish
adaptation of the MONICA criteria) were used throughout the study
period. In contrast, those authors found that the PPV's in the ICD- 
9 period were generally higher than in the ICD-8 period. 



However, the impact that cardiac troponin testing has on the 
validity of MI diagnoses is difficult to ignore. For example, 
Pajunen et al [43] reported higher sensitivity during the ICD-10
period (1998-2002) than the ICD-9 period (1988-1997), but the
authors attribute this difference to the use of cardiac troponin
testing during the ICD-10 period. We believe the introduction of
cardiac troponin testing and its increasing use over time may be 
mainly responsible for the improvements we observed in the PPV 
of MI codes over time. 

When examining only studies that used the MONICA criteria, 
we observed that the PPV's were usually higher in studies 
stemming from the original MONICA project compared to those 
just applying the MONICA criteria in other samples. This was 
especially apparent amongst the European studies from the 
MONICA project. One explanation for this may be some cross- 
referencing between the hospital databases and MONICA 
registries. It is acknowledged in these studies [21,35] how the 
MONICA project itself may have influenced local coding 
practices. For example, some of the same physicians that were 
involved with the MONICA study were also treating patients 
hospitalized for coronary events in local centres. Whatever
potential influence these factors may have had in Europe, they did
not appear to carry over to Australia and New Zealand, where the
PPV's in studies using the MONICA registries were much lower.

We observed that the accuracy of MI as a cause of death on 
death certificates was lower in comparison to hospitalization data. 
Death certificate diagnoses of MI may be less accurate because less 
information is available on these cases from which to determine a 
precise cause of death. Specifically, many deaths are not attended 
to by medical personnel, resulting in a lack of comprehensive 
documentation [39]. In support of this, Lowel et al [41] found that 
the PPV's were lower for cases who spent less time in hospital, and 
had less clinical data and test results (including electrocardiograms 
and enzyme levels) available, which could otherwise aid in 
establishing a more accurate cause of death [41]. 

Figure 2. Positive Predictive Values of Myocardial Infarction Diagnoses (versus "Definite" or "Definite/Probable/Possible MI", or 
parameters unspecified). The positive predictive values (PPV's) and 95% confidence intervals (where reported) from studies that validated 
myocardial infarction (MI) diagnoses in hospitalization data, and included a formal set of diagnostic criteria in the reference standard, are ordered left- 
to-right by publication year of the study (with the earliest-published study on the far left). The PPV's are also stratified by whether cardiac troponin 
testing was incorporated in the diagnostic criteria. Illustrated in Panel A are the PPV's calculated when the coded diagnoses were compared to the 
stricter parameter of "Definite MI", and the PPV's for which no parameter was specified. Illustrated in Panel B are the PPV's calculated when the coded 
diagnoses were compared to the broader parameter of "Definite and Probable or Possible MI", along with the same PPV's in Panel A for which no 
parameter was specified. 
doi:10.1371/journal.pone.0092286.g002 

Our review showed that the accuracy of hospitalization data for 
identifying MI cases is much higher than data from death 
certificates; consequently, we recommend that, when available, 
researchers attempt to confirm the cause of death by matching 
vital statistics death records for MI with administrative hospital- 
ization data. At the very least, the limitations of vital statistics data 
should be acknowledged by these authors. 
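
The recommendation to confirm death-certificate MI cases against hospitalization data could be implemented along the following lines. This is only a minimal sketch under stated assumptions: the record layout (`patient_id`, `death_date`, `admit_date`, `mi_code`), the function name, the 28-day window, and the example records are all hypothetical and are not drawn from any of the reviewed studies.

```python
from datetime import date, timedelta


def confirm_mi_deaths(deaths, hospitalizations, window_days=28):
    """Split death-certificate MI cases by whether a hospital MI record supports them.

    A death counts as "confirmed" if the same patient has an MI-coded admission
    on or up to `window_days` before the date of death (an assumed rule).
    """
    # Index MI-coded admission dates by patient identifier.
    mi_admissions = {}
    for h in hospitalizations:
        if h["mi_code"]:
            mi_admissions.setdefault(h["patient_id"], []).append(h["admit_date"])

    window = timedelta(days=window_days)
    confirmed, unconfirmed = [], []
    for d in deaths:
        admit_dates = mi_admissions.get(d["patient_id"], [])
        supported = any(
            timedelta(0) <= d["death_date"] - admitted <= window
            for admitted in admit_dates
        )
        (confirmed if supported else unconfirmed).append(d["patient_id"])
    return confirmed, unconfirmed


# Hypothetical records, for illustration only.
deaths = [
    {"patient_id": "A", "death_date": date(2010, 3, 10)},
    {"patient_id": "B", "death_date": date(2010, 5, 1)},
    {"patient_id": "C", "death_date": date(2010, 6, 15)},
]
hospitalizations = [
    {"patient_id": "A", "admit_date": date(2010, 3, 5), "mi_code": True},
    {"patient_id": "B", "admit_date": date(2009, 1, 1), "mi_code": True},   # outside window
    {"patient_id": "C", "admit_date": date(2010, 6, 10), "mi_code": False},  # not an MI code
]

confirmed, unconfirmed = confirm_mi_deaths(deaths, hospitalizations)
print("confirmed:", confirmed)
print("unconfirmed:", unconfirmed)
```

In such a scheme, the unconfirmed group is the natural candidate for exclusion in a sensitivity analysis; the appropriate window and matching keys would depend on the databases being linked.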

Many of the findings presented in this paper are based on PPV, 
which was the most frequently-reported statistic amongst the 
studies included in this review. PPV is relatively easy for 
researchers to assess since they only need to evaluate cases who 
initially test positive for the condition (here being MI). However, a 
caveat of both PPV and NPV is their dependence on the 
prevalence of the condition in the study population [63]. The PPV 
will be lower for a rare condition than for a common condition. 
For example, amongst all those testing positive for a rare condition (those 
in the denominator), few are likely to be true-positives (and appear 
in the numerator). In this review, we expected the PPV's to be 
lower amongst the community-based studies than the clinic-based 
studies or those with otherwise more selected populations, and this 
was apparent in several studies. For instance, the PPV in a study of 
patients admitted to coronary care units was 89% [48] and in two 
studies that were restricted to individuals aged 65 years and older 
(amongst whom MI is more common) the PPV's were 95% [47] 
and 98% [45]. In contrast, in another study which had a younger 
source population (aged between 25 and 64 years), the PPV was 
much lower (only 67%) [37]. Consequently, differences in the 
expected prevalence of MI in the different source populations may 
have contributed to variation in the PPV's reported by the 
different studies in this review. 
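
The dependence of PPV on prevalence follows from Bayes' theorem: PPV = (sensitivity x prevalence) / (sensitivity x prevalence + (1 - specificity) x (1 - prevalence)). The short sketch below illustrates this relationship. The sensitivity, specificity, and prevalence values are hypothetical, chosen only for illustration, and are not taken from any study in this review.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value computed from Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # expected true-positive fraction
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # expected false-positive fraction
    if true_pos + false_pos == 0:
        raise ValueError("PPV is undefined when no one tests positive")
    return true_pos / (true_pos + false_pos)


# Hypothetical code accuracy, held fixed while prevalence varies.
SENS, SPEC = 0.90, 0.95
ppv_by_prevalence = {p: ppv(SENS, SPEC, p) for p in (0.01, 0.10, 0.50)}

for p, value in ppv_by_prevalence.items():
    print(f"prevalence={p:.2f}  PPV={value:.3f}")
```

With sensitivity and specificity held constant, the computed PPV rises as prevalence rises, which is consistent with the pattern expected between community-based and more selected study populations.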

This review identified a significant research gap: a lack of 
studies reporting on the validity of codes 
from the ICD-10. This system has been in widespread use in 
Europe and Australia for at least a decade, but ICD-10 codes were 
evaluated in just three studies included in this review, and only two 
of these [42,43] reported on the validity of ICD-10 codes 
separately from ICD-9 codes. One of these studies reported that 
the PPV for ICD-10 I21-I22 was good, especially in tertiary care 
hospitals (PPV = 93%) [42], and findings from the other suggest 
that ICD-10 I21-I22 is more sensitive for MI than the equivalent 
ICD-9 code, 410 [43]. With ICD-10 codes now a key component 
of health research, assessments of the validity of ICD-9 codes are 
quickly losing their relevance, and clearly, more investigations into 
the accuracy of ICD-10 codes are needed to support ongoing 
research endeavours. 

Our systematic review has some limitations. We could not 
consider articles whose full-text was not available in English, and 
this may have introduced a language bias. We were unable to 
include articles that did not report or reference the diagnostic 
algorithms being validated, or those that were published after the 
conclusion of our search period (February 2011). As well, although 
our MEDLINE and EMBASE searches were conducted by an 
experienced librarian, some relevant studies may have been missed 
since administrative databases are not well catalogued in these 
indexes (e.g. no MeSH term pertaining to "administrative 
database"). Most of the articles included in this review were 
located through database searches. In these, we searched for 
articles that were indexed under terms relating to Administrative 
Data, Validation, and Cardiovascular Disease. However, in our 
subsequent handsearch we located several relevant articles that 
were not indexed under these Administrative Data or Validation 
categories. Thus, while our handsearches were extensive, it is 
possible that we still missed some relevant articles if they were not 
indexed in the databases with a term relating to validation or 
administrative data, or were published in a journal not indexed in 
the MEDLINE or EMBASE databases. 

In summary we conclude that, based on the evidence, 
hospitalization data can be used to identify MI as a covariate or 
outcome, but the accuracy of MI as a cause-of-death in vital 
statistics data is limited. Authors using vital statistics data to 
identify MI deaths are encouraged to compare such data with 
hospitalization data to confirm the cause of death or use sensitivity 
analyses excluding cases from this source. While most adminis- 
trative databases are not established for research purposes, they 
are increasingly being used to study long-term patient outcomes 
and disease burden. Therefore, in order to maximize the sensitivity 
of these databases, physicians and hospital coders should be 
encouraged to record all significant complications and 
comorbidities. In the meantime, authors using administrative data to 
identify MI deaths should acknowledge the limitations of this data 
source. Finally, with ICD-10 coding now commonplace, more 
assessments of the validity of ICD-10 codes for MI are needed to 
ensure the quality of future research. We believe our findings will 
help to increase the rigour of population-based epidemiological 
and outcomes research and thus potentially improve health 
surveillance, resource allocation and patient care. 